Background

The relationship between mental health and education level has been studied extensively, with a number of theories emerging on how the two interact. Correlative trends have been observed in PhD students, possibly owing to increased stress (1.). While other studies have shown a significantly reduced incidence of mental illness in educated individuals, hypothesising a protective effect (2.).

This report aims to explore the relationship between antidepressant prescriptions and education level across Scotland, using Public Health Scotland prescribing data, and data from Scotland’s most recent census in 2022. The prescribing data will be taken from January to August 2025 and a monthly average calculated, minimising any recording of repeat prescriptions following the NHS’ prescribing recommendations indicating an optimal 28-day repeat prescription duration (3, 4.).

Part 1: Exploring the prescribing rates

Prescribing data can be found here: https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community

library(tidyverse)
library(stringr)
library(janitor)
library(here)
library(ggrepel)
library(readr)
library(dplyr)
library(gt)
library(plotly)
library(forcats)
library(scales)

files <- list.files(here("data","consecutive_data"), pattern = "csv", full.names = TRUE) # 8 months of prescribing data through 2025
months <- seq(as.Date("2025-01-01"), as.Date("2025-08-01"), by = "month")
all_prescribing <- map2_df(files, months, ~ read_csv(.x, 
                                                     col_types = cols(DMDCode = col_character(), # ensure DMD codes are all characters, as some are read as numbers 
                                                                      .default = col_guess()), show_col_types = FALSE) %>%
    mutate(month = .y)) %>% 
  clean_names() %>% 
  select(hbt, gp_practice, bnf_item_code, number_of_paid_items, month)

First, we need to find all the antidepressants being prescribed in Scotland. This is made possible by the inclusion of BNF item codes in the Public Health Scotland prescribing data set. By identifying the relevant codes for antidepressants we can filter the data set down to just the drugs we want to look at.

In order to identify the code for antidepressants, we can look at the BNF code information provided by the NHS Business Services Authority and filter the data set down to just antidepressants, then calculate average monthly prescribing numbers for each health board.

BNF information is available at: https://opendata.nhsbsa.net/dataset/bnf-code-information-current-year

BNF_info <- read_csv(here("data/bnf_code_current_202506_version_88.csv")) %>% 
  filter(BNF_SECTION == "Antidepressant drugs")  # filter the data to show just the antidepressants
unique(BNF_info$BNF_SECTION_CODE)  # 0403 is the code we use to focus our prescribing data
## [1] "0403"
antidepressants <- all_prescribing %>% 
  filter(str_starts(bnf_item_code, "0403"), hbt != "SB0806") # remove special health board as we filter for antidepressants

antidepressants_avg <- antidepressants %>% 
  group_by(hbt, month) %>% 
  summarise(total_items = sum(number_of_paid_items, na.rm = TRUE), .groups = "drop") %>%  # summing the number of monthly antidepressant prescriptions for each health board
  group_by(hbt) %>% 
  summarise(mean_items = mean(total_items, na.rm = TRUE), .groups = "drop") # prescription averages for each health board across the 8 months

Next, let’s add some names for the health boards to our data to allow for clearer visualisation down the line, and we can get an idea of the range of total prescriptions across Scotland. Using the flexible table builder on the Scottish census website, I have created a table with the numbers for the highest level of education (columns) achieved by healthboard (rows). We can also join this with the data for antidepressants, using the healthboard names.

Health board names available at: https://www.opendata.nhs.scot/dataset/geography-codes-and-labels/resource/652ff726-e676-4a20-abda-435b98dd7bdc Census table builder is available at: https://www.scotlandscensus.gov.uk/search-the-census#/

hb_names <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv")

data_antidepressants <- hb_names %>% 
  inner_join(antidepressants_avg, by = c("HB" = "hbt")) %>% 
  clean_names() %>% 
  select(!c(hb_date_enacted, hb_date_archived, country)) %>% 
  mutate(hb_name = gsub("^NHS ", "", hb_name)) # removes NHS prefix for each health board to match the naming conventions in the census data

# we have to load the census data as raw text to extract just the information we need
raw_lines <- read_lines(here("data/education_level_healthboard.csv"))
info <- raw_lines[10:25] # keep only the headers and the data (lines 10–25), also excluding the extra total row we don't need
info <- info[-2] # remove the extra header line

census_data22 <- read_csv(
  I(info), # feed character vector as a connection
  col_names = TRUE, show_col_types = FALSE) %>%
  clean_names() %>%
  rename(hb = highest_level_of_qualification,
         NA_or_under_16 = not_applicable_aged_less_than_16,
         none = no_qualifications,
         lower_school = lower_school_qualifications,
         upper_school = upper_school_qualifications,
         further_education_sub_degree = further_education_and_sub_degree_higher_education_qualifications_incl_hnc_hn_ds,
         degree = degree_level_qualifications_or_above_education_qualifications_not_already_mentioned_including_foreign_qualifications,
         apprenticeship = apprenticeship_qualifications) %>% 
  remove_empty("cols") 
  
education_antidepressants <- data_antidepressants %>% 
  full_join(census_data22, by = c("hb_name" = "hb"))

# We need to show our flat numbers as proportions to properly gauge the relative prescribing rate of antidepressants. Luckily, we have a "total" column from the census data that shows us the population of each healthboard, so we can create a new column with the antidepressant prescriptions per capita from it
education_antidepressants <- education_antidepressants %>% 
  mutate(apc = mean_items / total, degree_pct = degree/total*100) # total prescriptions per capita and percentage of degree holders in each healthboard

Prescribing rates

Now we have all the data ready to investigate the prescribing rates of the health boards.

education_antidepressants %>% 
  select(hb_name, mean_items, total, apc) %>% 
  arrange(desc(apc)) %>% 
  gt(
    rowname_col   = "hb_name"
  ) %>% 
  tab_stubhead(label = "Health board") %>% 
  cols_label(
    mean_items = "Average monthly items",
    total      = "Population",
    apc        = "Prescriptions per capita"
  ) %>% 
  tab_spanner(
    label = "Prescribing metrics",
    columns = c(mean_items, apc)
  ) %>% 
  fmt_number(
    columns = c(mean_items, apc),
    decimals = 2
  ) %>% 
  fmt_number(
    columns = total,
    decimals = 0,
    use_seps = TRUE
  ) %>% 
  grand_summary_rows(
    columns = c(mean_items, apc),
    fns = list(
      "Mean" = ~ mean(.x, na.rm = TRUE)
    )
  ) %>% 
  tab_source_note(
    source_note = "Data: Public Health Scotland; Prescriptions in the Community"
  ) %>% 
  tab_style(
  style = list(
    cell_fill(color = "#e5e5e5"),
    cell_text(weight = "bold")
  ),
  locations = cells_column_labels(everything())
  ) %>%
  tab_style(
  style = cell_text(align = "left"),
  locations = list(
    cells_column_labels(everything()),
    cells_body(columns = everything())
  )) %>% 
  tab_header(
    title = "Antidepressant Prescribing by Scottish Health Board",
    subtitle = "Prescription data from January to August 2025"
  ) %>% 
  tab_options(
    table.width = pct(80),
    heading.title.font.size = 24,
    heading.subtitle.font.size = 14,
    column_labels.font.weight = "bold"
  )
Antidepressant Prescribing by Scottish Health Board
Prescription data from January to August 2025
Health board
Prescribing metrics
Population
Average monthly items Prescriptions per capita
Dumfries and Galloway 24,197.25 0.17 145,895
Ayrshire and Arran 55,730.25 0.15 365,256
Lanarkshire 99,087.38 0.15 668,027
Greater Glasgow and Clyde 166,587.62 0.14 1,177,213
Forth Valley 41,738.50 0.14 302,784
Western Isles 3,587.50 0.14 26,140
Fife 49,578.62 0.13 371,781
Borders 15,569.50 0.13 116,821
Shetland 2,997.12 0.13 22,986
Tayside 51,971.12 0.13 413,992
Highland 37,482.75 0.12 321,321
Orkney 2,526.00 0.12 21,958
Grampian 62,793.38 0.11 581,040
Lothian 93,846.25 0.10 904,628
Mean 50549.52 0.1321014
Data: Public Health Scotland; Prescriptions in the Community

Results

The table shows the average monthly antidepressant prescriptions per capita from January to August 2025 in all 14 Scottish health boards. Dumfries and Galloway is the clear leader in prescribing rate, with a relatively noticeable gap between it and Ayrshire and Arran following in second, while Lothian has the lowest rate of the lot, but not too far removed from the other boards towards the bottom like Grampian, Orkney and Highland. The total range of prescriptions per capita is noticeable, with the lowest at 0.1 and the highest at 0.17, it’s almost a 70% increase. This indicates meaningful geographical variation in antidepressant prescribing across Scotland, with many factors possibly contributing to the inequalities, such as: local policy, socioeconomic disparities, and baseline population characteristic differences.

Part 2: Exploring education levels

Now, using the same census data, we should have a look at the education levels of each health board relative to each other, so we can get an idea of the landscape of this variable.

education_data <- education_antidepressants %>% 
  pivot_longer(
    c(NA_or_under_16:degree),
    names_to  = "edu_lvl",
    values_to = "edu_lvl_count"
  ) %>% 
  mutate(
    edu_lvl_pct = edu_lvl_count / total, edu_lvl = factor(edu_lvl, levels = rev(c( # reverse order of census progression for better visualisation
        "NA_or_under_16",
        "none",
        "lower_school",
        "upper_school",
        "apprenticeship",
        "further_education_sub_degree",
        "degree"
      ))
    ),
    edu_lvl = fct_recode( # plotly doesn't adopt legend labels set in ggplot so have to label the levels properly here
      edu_lvl,
      "Under 16 / N.A."                = "NA_or_under_16",
      "No qualifications"              = "none",
      "Lower school"                   = "lower_school",
      "Upper school"                   = "upper_school",
      "Apprenticeship"                 = "apprenticeship",
      "Further education (sub-degree)" = "further_education_sub_degree",
      "Degree or above"                = "degree"
    )
  )

graph1 <- education_data %>% 
  ggplot(aes(
    x    = edu_lvl_pct,
    y    = reorder(hb_name, degree_pct), 
    fill = edu_lvl,
    text = paste(
      "Education level:", edu_lvl,
      "<br>Percentage:", percent(edu_lvl_pct, accuracy = 0.1)
    )
  )) +
  geom_col(colour = "grey20", linewidth = 0.2) +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_manual(
    name   = "Highest level of qualification",
    values = c(
      "Under 16 / N.A."                = "#f2e5ff",
      "No qualifications"              = "#d8b7ff",
      "Lower school"                   = "#be89ff",
      "Upper school"                   = "#a35cff",
      "Apprenticeship"                 = "#8930ff",
      "Further education (sub-degree)" = "#7013e6",
      "Degree or above"                = "#4d0099"
    )
  ) +
  labs(
    x = "Percentage of health board population",
    y = "Health board",
    title = "Education level distribution by Scottish health board",
  ) +
  theme(legend.position = "right",
      legend.title = element_text(size = 9),
      legend.text  = element_text(size = 8))
ggplotly(graph1, tooltip = "text")

Results

Part 3: Investigating the relationship

We’re now ready to look into the relationship between antidepressant prescriptions and education level. We’ll do this by using the percentage of each health board’s population that are degree holders as calculated earlier, and running a linear regression to determine the correlation between the two variables.

library(tidymodels)

lm_model <- linear_reg() %>% 
  set_engine('lm') %>% 
  set_mode("regression") %>% 
  fit(apc ~ degree_pct, data = education_antidepressants)
new_data <- tibble(degree_pct = seq(min(education_antidepressants$degree_pct),
                   max(education_antidepressants$degree_pct),
                   length.out = 100))

pred_data <- predict(lm_model, new_data = new_data) %>% 
  bind_cols(new_data)

lm_fit <- lm(apc ~ degree_pct, data = education_antidepressants)

ci_pred <- as_tibble(predict(lm_fit, newdata = new_data, interval = "confidence")) %>%
  bind_cols(new_data)

graph2 <- plot_ly(
  education_antidepressants,
  x = ~degree_pct,
  y = ~apc,
  type = "scatter",
  mode = "markers",
  marker = list(
    color   = "#061A61",
    size    = 15,
    opacity = 0.8
  ),
  alpha = 0.65,
  name = "Health boards",
  text = ~hb_name,
  hoverinfo = "text+x+y"
  ) %>%
  add_ribbons(
    data = ci_pred,
    x = ~degree_pct,
    ymin = ~lwr,
    ymax = ~upr,
    name = "95% CI",
    showlegend = FALSE,
    hoverinfo = "none",
    inherit = FALSE,
    fillcolor = "rgba(6, 26, 97, 0.15)",
    line = list(color = "rgba(0,0,0,0)")
  ) %>%
  add_lines(
    data = pred_data,
    x = ~degree_pct,
    y = ~.pred,
    name = "Regression",
    hoverinfo = "none",
    inherit = FALSE,
    line = list(width = 3, color = "#D92702")
  ) %>% 
  style(
    traces = 1,
    hovertemplate = paste(
      "<b>%{text}</b><br>",
      "Degree pct: %{x:.1f}%<br>",
      "APC: %{y:.3f}<extra></extra>"
    )
  ) %>%
  layout(
    title = list(
      text = "Antidepressant prescribing vs degree-level education<br>
              <sup>Scottish Health Boards, Jan–Jun 2025</sup>"
    ),
    xaxis = list(title = "Degree-level qualification (%)"),
    yaxis = list(title = "Prescriptions per capita")
  )
graph2

Results

A surface level interpretation of this graph says that there is a strong negative correlation between the variables: health boards with a higher percentage of degree holders have a lower antidepressant prescribing rate. To take just the extremes, Dumfries and Galloway with the highest prescribing rate sits at the far left of the graph, among the bottom health boards in degree attainment, while Lothian has the lowest prescribing rate and is directly juxtaposed, with the highest degree attainment by a large margin. In isolation, this is a strong point backing the negative correlation.

However, confidence intervals are wide, and the majority of the health boards are distributed quite close to each other in education level, while still displaying a range of prescription rates, detracting from the credibility of the suggested relationship.

It is clear there is variation in the variables, and possibly a relationship between them, but the geographical stratification is too large at the level of health boards. To try and get a clearer image of the relationship, let’s do the same analysis, but with Scottish data zones, focusing in on much smaller localities.

Part 4: Focusing in

The original prescribing data already had GP practice codes, so we can map the prescriptions to data zones using a practice lookup data set from Public Health Scotland. After that we’ll do the same as we did to the prescribing data before, finding the total prescriptions per month and then averaging, just this time by data zone instead of health board, and then adding on the census data for education, again using the flexible table builder with data zones instead of health boards.

GP practices and list sizes October 2025 available from: https://www.opendata.nhs.scot/dataset/gp-practice-contact-details-and-list-sizes/resource/47557411-7eda-4278-9d6d-d26ed2ceab5a?inner_span=True

## load in practice details
practice_info <- read_csv("https://www.opendata.nhs.scot/dataset/f23655c3-6e23-4103-a511-a80d998adb90/resource/47557411-7eda-4278-9d6d-d26ed2ceab5a/download/practice_contact_details_20251001_opendata.csv") %>% clean_names()

antidepressants_practice <- antidepressants %>%
  inner_join(practice_info, by = c("gp_practice" = "practice_code")) %>% 
  select(c(gp_practice, number_of_paid_items, month, data_zone)) %>% 
  group_by(data_zone, month) %>% 
  summarise(total_items = sum(number_of_paid_items, na.rm = TRUE), .groups = "drop") %>%  # total prescriptions per month for each data zone
  group_by(data_zone) %>% 
  summarise(mean_items = mean(total_items, na.rm = TRUE), .groups = "drop") # average monthly prescriptions from Jan-Aug 2025 for each data zone

raw_lines_dz <- read_lines(here("data/census_education_dz.csv"))
dz_rows <- grep('^"S\\d{8}"', raw_lines_dz) # use regex to find just the rows with the data we need
core_text_dz <- c(raw_lines_dz[10], raw_lines_dz[dz_rows]) # combine header row with the data
census_dz <- read_csv(
  I(core_text_dz), # character vector fed in
  show_col_types = FALSE) %>%
  clean_names() %>%
  rename(data_zone = highest_level_of_qualification) %>%
  select(data_zone, degree_level_qualifications_or_above_education_qualifications_not_already_mentioned_including_foreign_qualifications, total) # only need these columns for investigation

antidepressants_education_dz <- inner_join(antidepressants_practice, census_dz)

Now, we can just add the antidepressants per capita and degree holder % to the data set and we can get a similar linear regression to what we had above, with data zones this time.

antidepressants_education_dz <- antidepressants_education_dz %>% 
  mutate(apc = mean_items/total, # antidepressants per capita
         degree_pct = (degree_level_qualifications_or_above_education_qualifications_not_already_mentioned_including_foreign_qualifications/total)*100) # degree holder percentage for each data zone

graph3 <- ggplot(antidepressants_education_dz, aes(x = degree_pct, y = apc)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    x = "Degree holders (%)",
    y = "Antidepressant prescriptions per capita",
    title = "Association between education level and antidepressant prescribing by Scottish Data Zone",
    subtitle = "Prescribing data from January to August 2025"
  )
ggplotly(graph3, tooltip = c("x", "y"))

This graph communicates a similar relationship as the data on health boards: that a higher educational attainment is negatively correlated with antidepressant prescribing. Again though, the data doesn’t confidently support the suggestion, with only a weakly negative slope. Most of the data zones are distributed around 0-3 antidepressants per capita, with a select few outliers near 10 and even one above 15. These few data zones with abnormally high prescribing rates are on the lower end in educational attainment, but this doesn’t reflect a cut and dry, certain relationship.

Next Steps

Despite the uncertainty of the visualisations produced, there are still valid conclusions to be drawn from this exploration.

It is clear that there are localities with significantly high antidepressant prescribing rates relative to the rest of the country, and it’s unlikely to be a coincidence that these are areas with a relatively low education attainment rate. It would be valuable to further investigate just these outlying data zones, to identify disparities in other characteristics of the area like deficient/harmful infrastructure or local policy, insufficient access to support schemes, or disadvantageous environmental factors among many others. These are all factors that would impact both education attainment and antidepressant use, as the two are inextricably linked and involved in constant, lifelong interactions with personal, communal, and systemic aspects of being.

  • table of contents with headings for each figure/visualisation
  • explore dosages?
  • could find geometric dataset for healthboards in scotland, then colour code the prescriptions to population ratio
  • warning in census data read (line 100)
  • chunk titles
  • table of contents with navigation, headings to split the report up a bit(may need to revise flow somewhat)
  • gt table somewhere maybe
  • look over joins
  • do i have to remove outlier in final graph?
  • do i have to set a seed for linear regression?
  • gen ai acknowledgement

References

  1. Bergvall, S., Fernström, C., Ranehill, E., & Sandberg, A. (2025). The impact of PhD studies on mental health-a longitudinal population study. Journal of health economics, 104, 103070. Advance online publication. https://doi.org/10.1016/j.jhealeco.2025.103070
  2. Maguire, A., Moriarty, J., O’Reilly, D. et al. (2017). Education as a predictor of antidepressant and anxiolytic medication use after bereavement: a population-based record linkage study. Qual Life Res 26, 1251–1262. https://doi.org/10.1007/s11136-016-1440-1
  3. NHS Business Services Authority. Electronic Repeat Dispensing Handbook. 2020. Available at: https://www.nhsbsa.nhs.uk/sites/default/files/2020-07/Electronic%20Dispensing%20Handbook_Digital_WEB_S-1589995676.pdf [Accessed 21 Nov 2025].
  4. Hertfordshire & West Essex Integrated Care Board. (n.d.) Prescription duration guidance. Available at: https://www.hweclinicalguidance.nhs.uk/prescribing-guidance/prescription-duration-guidance-hwe-icb/ [Accessed 21 Nov 2025].